AITopics | visual story

visual story

KAHANI: Culturally-Nuanced Visual Storytelling Pipeline for Non-Western Cultures

Hamna, Sudharsan, Deepthi, Seth, Agrima, Budhiraja, Ritvik, Khullar, Deepika, Jain, Vyshak, Bali, Kalika, Vashistha, Aditya, Segal, Sameer

arXiv.org Artificial Intelligence | Oct-28-2024

Large Language Models (LLMs) and Text-To-Image (T2I) models have demonstrated the ability to generate compelling text and visual stories. However, their outputs are predominantly aligned with the sensibilities of the Global North, often resulting in an outsider's gaze on other cultures. As a result, non-Western communities have to put extra effort into generating culturally specific stories. To address this challenge, we developed a visual storytelling pipeline called KAHANI that generates culturally grounded visual stories for non-Western cultures. Our pipeline leverages the off-the-shelf models GPT-4 Turbo and Stable Diffusion XL (SDXL). By using Chain of Thought (CoT) and T2I prompting techniques, we capture the cultural context from the user's prompt and generate vivid descriptions of the characters and scene compositions. To evaluate the effectiveness of KAHANI, we conducted a comparative user study with ChatGPT-4 (with DALL-E 3) in which participants from different regions of India compared the cultural relevance of stories generated by the two tools. Results from the qualitative and quantitative analysis of the user study showed that KAHANI was able to capture and incorporate more Culturally Specific Items (CSIs) than ChatGPT-4. In terms of both its cultural competence and visual story generation quality, our pipeline outperformed ChatGPT-4 in 27 out of the 36 comparisons.

dalhousie, participant, pipeline, (15 more...)

arXiv.org Artificial Intelligence

2410.19419

Country:

Asia > India > Tamil Nadu > Chennai (0.04)
Asia > Indonesia > Bali (0.04)
Asia > India > Gujarat (0.04)
(5 more...)

Genre:

Research Report > New Finding (1.00)
Questionnaire & Opinion Survey (1.00)

Industry: Education (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Metamorpheus: Interactive, Affective, and Creative Dream Narration Through Metaphorical Visual Storytelling

Wan, Qian, Feng, Xin, Bei, Yining, Gao, Zhiqi, Lu, Zhicong

arXiv.org Artificial Intelligence | Mar-1-2024

Human emotions are essentially molded by lived experiences, from which we construct personalised meaning. Engagement in such a meaning-making process has been practiced as an intervention in various psychotherapies to promote wellness. Nevertheless, support for recollecting and recounting lived experiences in everyday life remains underexplored in HCI. It also remains unknown how technologies such as generative AI models can facilitate the meaning-making process and ultimately support affective mindfulness. In this paper we present Metamorpheus, an affective interface that engages users in creative visual storytelling of emotional experiences during dreams. Metamorpheus arranges the storyline based on a dream's emotional arc, and provokes self-reflection through the creation of metaphorical images and text depictions. The system provides metaphor suggestions, and generates visual metaphors and text depictions using generative AI models, while users can apply generations to recolour and re-arrange the interface to be visually affective. Our experience-centred evaluation shows that, by interacting with Metamorpheus, users can recall their dreams in vivid detail, through which they relive and reflect upon their experiences in a meaningful way.

emotion, metamorpheus, participant, (15 more...)

arXiv.org Artificial Intelligence

2403.00632

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
North America > United States > Hawaii > Honolulu County > Honolulu (0.05)
Asia > China > Hong Kong (0.04)
(6 more...)

Genre:

Questionnaire & Opinion Survey (1.00)
Personal > Interview (0.68)
Research Report > New Finding (0.46)

Industry:

Health & Medicine > Therapeutic Area > Psychiatry/Psychology (1.00)
Health & Medicine > Therapeutic Area > Neurology (1.00)
Health & Medicine > Consumer Health (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Cognitive Science > Emotion (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.45)

Hands-on with Microsoft's Dall-E 2-based Bing Image Creator: It's good!

PCWorld | Mar-21-2023, 17:48:20 GMT

Today, Microsoft begins integrating AI art into its AI-powered Bing Chat chatbot with Bing Image Creator…and it's surprisingly good. Microsoft began previewing Image Creator last fall in select markets, and its generative AI art later became the foundation for Microsoft Designer, the excellent design application that also uses AI art to help create templates, flyers, and simple greeting cards. Today, Bing Image Creator will begin integrating with Bing Chat's textual chatbot, but will also generate images at its own site, Bing.com/create. Put another way, you'll be able to ask Bing's chatbot to create your own images from an integrated text prompt within Bing Chat, or else use the dedicated site. There's a third option, too: Use the new Edge Copilot sidebar within Microsoft Edge, which has been used for textual generation via AI.

bing image creator, image creator, microsoft, (12 more...)

PCWorld

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (1.00)
Information Technology > Artificial Intelligence > Natural Language (0.99)

Integrating Visuospatial, Linguistic and Commonsense Structure into Story Visualization

Maharana, Adyasha, Bansal, Mohit

arXiv.org Artificial Intelligence | Oct-20-2021

While much research has been done in text-to-image synthesis, little work has been done to explore the usage of linguistic structure of the input text. Such information is even more important for story visualization since its inputs have an explicit narrative structure that needs to be translated into an image sequence (or visual story). Prior work in this domain has shown that there is ample room for improvement in the generated image sequence in terms of visual quality, consistency and relevance. In this paper, we first explore the use of constituency parse trees using a Transformer-based recurrent architecture for encoding structured input. Second, we augment the structured input with commonsense information and study the impact of this external knowledge on the generation of visual story. Third, we also incorporate visual structure via bounding boxes and dense captioning to provide feedback about the characters/objects in generated images within a dual learning setup. We show that off-the-shelf dense-captioning models trained on Visual Genome can improve the spatial structure of images from a different target domain without needing fine-tuning. We train the model end-to-end using intra-story contrastive loss (between words and image sub-regions) and show significant improvements in several metrics (and human evaluation) for multiple datasets. Finally, we provide an analysis of the linguistic and visuo-spatial information. Code and data: https://github.com/adymaharana/VLCStoryGan.

caption, proceedings, vlc-storygan, (14 more...)

arXiv.org Artificial Intelligence

2110.10834

Country:

North America > United States > North Carolina (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.82)

Industry: Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.88)

deepsing: Generating Sentiment-aware Visual Stories using Cross-modal Music Translation

Passalis, Nikolaos, Doropoulos, Stavros

arXiv.org Artificial Intelligence | Dec-11-2019

In this paper we propose a deep learning method for performing attribute-based music-to-image translation. The proposed method is applied to synthesizing visual stories according to the sentiment expressed by songs. The generated images aim to induce in viewers the same feelings as the original song, reinforcing the primary aim of music, i.e., communicating feelings. The process of music-to-image translation poses unique challenges, mainly due to the unstable mapping between the different modalities involved in this process. In this paper, we employ a trainable cross-modal translation method to overcome this limitation, leading to the first, to the best of our knowledge, deep learning method for generating sentiment-aware visual stories. Various aspects of the proposed method are extensively evaluated and discussed using different songs.

sentiment, vector, visual story, (15 more...)

arXiv.org Artificial Intelligence

1912.05654

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > Greece > Central Macedonia > Thessaloniki (0.04)
Asia > India (0.04)

Genre: Research Report (0.50)

Industry:

Media > Music (1.00)
Leisure & Entertainment (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Visual Story Post-Editing

Hsu, Ting-Yao, Huang, Chieh-Yang, Hsu, Yen-Chia, Huang, Ting-Hao 'Kenneth'

arXiv.org Artificial Intelligence | Jun-4-2019

We introduce the first dataset for human edits of machine-generated visual stories and explore how these collected edits may be used for the visual story post-editing task. The dataset, VIST-Edit, includes 14,905 human-edited versions of 2,981 machine-generated visual stories. The stories were generated by two state-of-the-art visual storytelling models, each aligned to 5 human-edited versions. We establish baselines for the task, showing how a relatively small set of human edits can be leveraged to boost the performance of large visual storytelling models. We also discuss the weak correlation between automatic evaluation scores and human ratings, motivating the need for new automatic metrics.
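
As a quick consistency check on the VIST-Edit figures quoted above, the short Python sketch below multiplies the number of machine-generated stories by the number of human-edited versions per story. The variable names are illustrative and do not come from the dataset's own tooling.

```python
# Figures quoted in the abstract above; variable names are illustrative only.
machine_generated_stories = 2_981
edits_per_story = 5

# Each machine-generated story is aligned to 5 human-edited versions.
total_edited_versions = machine_generated_stories * edits_per_story
print(total_edited_versions)  # 14905
```

The product matches the 14,905 human-edited versions reported in the abstract.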

artificial intelligence, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

1906.01764

Country:

North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.14)
Oceania > Australia > Victoria > Melbourne (0.04)
North America > United States > Pennsylvania > Centre County > State College (0.04)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Getty Images Is Using Artificial Intelligence to Help Newsrooms Choose Better Photos

#artificialintelligence | Aug-3-2018, 11:01:27 GMT

Getty Images is embracing artificial intelligence, starting with a way to help publishers pick photos. Today, the photo agency debuted a tool that uses AI to analyze a story and suggest photos that might go along with it depending on the text and content. The tool, called Panels, uses natural language processing--a term for how computers can learn to "read" human words, phrases and sentences--to then match a story based on keywords, images, captions and other criteria. Publishers also will then have access to custom filters and a self-improving algorithm to move around keywords or select images through a more human-driven process. Here's how it works: When someone enters in the URL for a story or copies and pastes in the text, Panels will analyze the words before suggesting people, places and things that appear in the story after weighing different options based on frequency and relevance.
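
The article does not describe how Panels weighs frequency and relevance, so the following is only a toy Python sketch of the general idea of ranking terms in a story by how often they occur. The function name `suggest_keywords`, the stopword list, and the sample story are all hypothetical and are not taken from Getty Images' tool.

```python
import re
from collections import Counter

# Hypothetical stopword list for this toy example; not taken from Panels.
STOPWORDS = {"the", "a", "an", "and", "of", "in", "to", "on", "for", "is", "at", "with"}


def suggest_keywords(text: str, top_n: int = 3) -> list[str]:
    """Return the top_n most frequent non-stopword terms in text."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(w for w in words if w not in STOPWORDS)
    return [word for word, _ in counts.most_common(top_n)]


# Made-up sample story used only to exercise the function.
story = (
    "The senate vote on the budget was delayed. "
    "Senate leaders said the budget vote will resume Monday, "
    "and the senate expects a close result."
)
print(suggest_keywords(story))
```

A real system would presumably combine such counts with entity recognition and image metadata; this sketch covers only the frequency step.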

artificial intelligence, getty image, natural language, (10 more...)

#artificialintelligence

Country: North America > United States > New York > New York County > New York City (0.05)

Industry:

Media > News (0.44)
Government > Voting & Elections (0.31)

Technology: Information Technology > Artificial Intelligence > Natural Language (1.00)
